ABSTRACT


We study the problem of partitioning a class of N students 
into k groups of n students each (N = k x n), such that
their learning from peer interactions is maximized. In our 
formalization of the problem, any student is able to increase 
his score in the subject the class is studying up to the score 
of the student who is at p-percentile among his higher ability 
peers. In contrast, the past work presumed that only stu- 
dents with score below the group mean may increase their 
score. We give a partitioning algorithm that maximizes to- 
tal gain summed over all the students for any value of p such 
that 100/(100—p) is integer valued. The time complexity of 
the proposed algorithm is only O(N log N). We also present 
experimental results using real-life data that show the supe- 
riority of the proposed algorithm over current strategies. 


1. INTRODUCTION 


A basic problem that has challenged educators for a long 
time is how to group students in a class in order to supple- 
ment their learning from the teacher with the learning from 
peers [6, 11]. Two popular strategies currently in vogue 
are: i) heterogeneous (also called diversity-based) grouping, 
and ii) homogeneous (also referred to as stratified or ability- 
based) grouping [5]. Both have their ardent proponents. 
The results from the empirical studies on the relative effec- 
tiveness of the two are inconclusive and the public opinion 
has also been mixed [3, 9]. 


In a major departure from the conventional thinking, a com- 
putational perspective was taken to address this problem 
in [1]. However, the learning model underlying the proposed 
algorithmic approach postulated that only the below average 
students are able to increase their ability score [4]. This pa- 
per removes this limitation, recognizing that every student 
can benefit from peer interactions [6, 8]. 


1.1 Contributions 

• We admit a general learning model that specifies that any
student is able to increase his ability score up to the level 
of the student who is at p-percentile amongst his higher 
ability peers. The value of p is an input parameter, se- 
lected by the educator. The model in [8] can be viewed as 
a special case, with p set to 100. 


• For the above learning model, we provide an algorithm
for partitioning N students into k groups of n students
each (N = k x n) with the goal of maximizing learning
gain summed over all the students. We show that the
algorithm is optimal for the values taken by p such that
$100/(100 - p)$ is integer-valued. Thus, it is optimal for
$p \in \{99, 98, 95, 90, 80, 75, 66\tfrac{2}{3}, 50\}$. The time
complexity of the algorithm is O(N log N).


• We present experimental results using real datasets, showing
the superiority of our approach over current strategies.


1.2 Limitations 


• Although our learning model has been abstracted from the
findings in the education literature, a rigorous empirical 
validation of the model is future work. The insights gained 
are nonetheless instructive. 


• Teaching others and giving help has been shown to be positively
correlated with an increase in learning [2]. Incorporating
such learning gains for high ability students is future work.


2. RELATED WORK 


The question of how to group students to maximize their 
gain from peer interactions was first addressed from a com- 
putational perspective in [1]. The authors proposed two 
functions to model learning gains. The first maximizes the 
number of students who improve their ability score [4], while 
the second incorporates the extent of these improvements. 
In both cases, however, only the below-average students
benefit and the higher ability students have zero gain.
The authors showed that the partitioning problem with the 
goal of maximizing the number of benefiting students is NP- 
complete, while they left open the question of the complexity 
class of the problem with the second gain function. 


The viewpoint that every student can learn from the higher 
ability peers is also present in [8]. In their model, every 
student may increase his ability to a fixed level, which is 
the ability of the highest ability student, i.e. p = 100. This 
assumption is too rigid and optimistic. In contrast, we admit 
various levels of gain for different students. 


Our problem bears resemblance to the expert-team formation
problem, in which the experts are multi-dimensional
vectors of skills and the goal is to find a team that can collec- 
tively perform a given task requiring certain skills [10]. How- 
ever, our students are described by 1-dimensional scores, and 
our objective is not to locate a single team, but to partition 
the students such that their learning gain is maximized. 


Our problem also superficially resembles the classical clus- 
tering problem [7]. However, unlike the classical clustering, 
which aims to maximize the similarity of all the points in a 
cluster to a cluster center, our problem has no one point in 
a partition with respect to which the distance of all other 
points needs to be optimized (see Fig. 1). 
Total Learning Gain = (0 + 1 + 2 + 3 + 3 + 4 + 5 + 6 + 6 + 7) = 37

Figure 1: Computation of the potential learning gain for a group of ten students with 75-percentile chosen as the reference point. The
$i$-th box contains the score of the $i$-th student. The learning gain for each student is the difference between his score and the score of
the student at p-percentile amongst his peers having higher score than him. For the first student, the index of the student at 75-percentile
amongst his higher ability peers is $(1 + \lceil (10 - 1) \times 75/100 \rceil) = 8$. Since the score of the latter is eight, the gain for the first student is
$(8 - 1) = 7$. For the second student, the index of the student at 75-percentile amongst his higher ability peers is also 8, i.e.
$(2 + \lceil (10 - 2) \times 75/100 \rceil)$, thus giving him a gain of $(8 - 2) = 6$, and so on. The gain for the last student is zero, as there is no one
above to learn from.


3. PROBLEM STATEMENT 


We have a class of N students. Each student $i$ is associated
with score $\theta_i \in \mathbb{R}_{>0}$, representing student's ability in the
subject the class is studying [4]. For simplicity, scores are
assumed to be distinct, so there is a one-to-one correspondence
between the student $i$ and the score $\theta_i$. Students are
ordered in the increasing order of scores.


Students are able to increase their score through interactions 
with peers in the group in accordance with a gain function [12, 13].
The gain from peer learning for a group $G$ is
given by a function $\mathcal{L}$. Our objective is to find k groups of
n students each (N = k x n), such that the overall gain for 
students is maximized. That is, our objective is 


$$\max \sum_{G \in \mathcal{G}} \mathcal{L}(G). \tag{1}$$

The learning function is of the form

$$\mathcal{L}(G) = \sum_{i=1}^{|G|} \left( R_i^G - \theta_i^G \right), \tag{2}$$

where $R_i^G$ is the reference score for the $i$-th ranked student
of $G$. The intuition is that each student can increase his
score up to the reference score.


3.1 Learning up to p-Percentile 
PROBLEM 1 (P-PERCENTILE PARTITIONING PROBLEM).
The gain function in Eq. 2 is given by

$$\mathcal{L}^p(G) = \sum_{i=1}^{|G|} \left( p_i^G - \theta_i^G \right), \tag{3}$$

where $p_i^G$ is the score of the student whose score is at the
p-percentile position of the scores of the students having higher
score than the $i$-th student in $G$.


For a given set of scores, the p-percentile score is the score
below which p% of scores fall. To find the p-percentile
score, the corresponding index is calculated first, which is
$\lceil np/100 \rceil$. The value at this index then is the p-percentile
score. Thus,

$$\text{p-percentile}(\theta_1, \theta_2, \ldots, \theta_n) = \theta_{\lceil np/100 \rceil}. \tag{4}$$


Fig. 1 graphically illustrates the percentile gain function.
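As a rough illustration of Eqs. 3 and 4, the following Python sketch computes the gain of a single group. The function name `percentile_gain` and the use of `Fraction` (to keep the ceiling exact for values such as 66 2/3) are choices made here for illustration and are not taken from the paper.

```python
from fractions import Fraction
from math import ceil


def percentile_gain(scores, p):
    """Total learning gain of one group under the p-percentile model (Eq. 3)."""
    theta = sorted(scores)      # ascending: theta[i - 1] is the i-th ranked score
    n = len(theta)
    p = Fraction(p)             # exact arithmetic inside the ceiling
    total = 0
    for i in range(1, n):       # the top student has no higher peers, so gain 0
        # 1-based index of the reference student among the whole group:
        # i, plus the p-percentile index among the (n - i) higher peers (Eq. 4)
        ref = i + ceil((n - i) * p / 100)
        total += theta[ref - 1] - theta[i - 1]
    return total


# The group of Fig. 1: scores 1..10 with the 75-percentile as reference.
print(percentile_gain(range(1, 11), 75))  # 37, matching the figure
```

With p = 100 every student is referred to the top score of the group, which recovers the special case attributed to [8].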


4. SOLUTION 

THEOREM 1. For values of p such that p/(100 — p) is 
integer-valued, the p-Percentile Partitioning problem can be 
solved optimally in O(N log N) time. 


We shall prove the theorem constructively by providing an 
optimal algorithm whose time complexity is O(N log N). It 
is named Percentile_Partitions and its pseudo-code is shown 
in Algorithm 1. The algorithm exploits the special structure 
of our problem that we elicit next. 


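We first expand the equation for learning gain w.r.t. p-percentile
as given in Eq. 3 into

$$\begin{aligned}
\mathcal{L}^p(G) = {} & \left(\text{p-percentile}(\theta_2^G, \theta_3^G, \ldots, \theta_n^G) - \theta_1^G\right) \\
& + \left(\text{p-percentile}(\theta_3^G, \theta_4^G, \ldots, \theta_n^G) - \theta_2^G\right) \\
& + \cdots + \left(\text{p-percentile}(\theta_n^G) - \theta_{n-1}^G\right).
\end{aligned}$$

Using the definition of p-percentile from Eq. 4, the above
can be written as

$$\mathcal{L}^p(G) = \left(\theta^G_{1+\lceil (n-1)p/100 \rceil} - \theta_1^G\right) + \left(\theta^G_{2+\lceil (n-2)p/100 \rceil} - \theta_2^G\right) + \cdots + \left(\theta_n^G - \theta_{n-1}^G\right).$$

To this we add the term $(\theta_n^G - \theta_n^G)$ corresponding to zero
gain of the $n$-th student. Thus, we have

$$\mathcal{L}^p(G) = \left(\theta^G_{1+\lceil (n-1)p/100 \rceil} - \theta_1^G\right) + \left(\theta^G_{2+\lceil (n-2)p/100 \rceil} - \theta_2^G\right) + \cdots + \left(\theta^G_{(n-1)+\lceil 1 \cdot p/100 \rceil} - \theta_{n-1}^G\right) + \left(\theta_n^G - \theta_n^G\right).$$

Collecting the positive and negative terms together, we get

$$\begin{aligned}
\mathcal{L}^p(G) = {} & \left(\theta^G_{1+\lceil (n-1)p/100 \rceil} + \theta^G_{2+\lceil (n-2)p/100 \rceil} + \cdots + \theta^G_{(n-1)+\lceil 1 \cdot p/100 \rceil} + \theta_n^G\right) \\
& - \left(\theta_1^G + \theta_2^G + \cdots + \theta_{n-1}^G + \theta_n^G\right),
\end{aligned}$$

which can be written succinctly as

$$\mathcal{L}^p(G) = \sum_{i=1}^{n} \theta^G_{i+\lceil (n-i)p/100 \rceil} - \sum_{i=1}^{n} \theta_i^G. \tag{5}$$

Using this equation, our objective becomes

$$\max \sum_{G \in \mathcal{G}} \left( \sum_{i=1}^{n} \theta^G_{i+\lceil (n-i)p/100 \rceil} - \sum_{i=1}^{n} \theta_i^G \right).$$

The second component in the above sum is constant for any
given set of ability scores. Therefore, our objective can be
simplified to

$$\max \sum_{G \in \mathcal{G}} \sum_{i=1}^{n} \theta^G_{i+\lceil (n-i)p/100 \rceil}. \tag{6}$$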


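LEMMA 1. Given $p \in [0, 100]$ and an ascending sequence
of $\theta_i \in \mathbb{R}_{>0}$, for $(100 - p) \mid 100$, $\sum_{i=1}^{n} \theta_{i+\lceil (n-i)p/100 \rceil}$ is
equivalent to $\sum_{i=1}^{n} \gamma_i \theta_i$, where

$$\gamma_i = \begin{cases}
\dfrac{100}{100-p}, & \text{if } \lceil np/100 \rceil < i \le n \\[4pt]
\mathrm{mod}\left(n, \dfrac{100}{100-p}\right), & \text{if } \dfrac{100}{100-p} \nmid n \text{ and } i = \lceil np/100 \rceil \\[4pt]
0, & \text{otherwise.}
\end{cases}$$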


PROOF. It is to be noted that a student at index $i$ improves
up to the score of student at index $i + \lceil (n-i)p/100 \rceil$.
As the student indexes are traversed from the higher-score
end to the lower end, with unit decrease in value of $i$, the
quantity $\lceil (n-i)p/100 \rceil$ increments by unity, except for the
values of $i$ for which $(n-i)p$ is a multiple of hundred. In the
latter case, although there is a decrement in the value of $i$
by one, the value of $\lceil (n-i)p/100 \rceil$ stays the same as that of
$\lceil (n-i-1)p/100 \rceil$, causing the index up to which students
are improving to decrement by one. It is easy to derive that
this process repeats itself after a period of $100/(100 - p)$.
Further, when $n$ is not a multiple of the above period, there
will be $\mathrm{mod}(n, 100/(100-p))$ students who will be improving
up to the smallest index value. For the remaining students,
as no other student improves up to their score, a $\gamma$ value of
zero is straightforward.

EXAMPLE 1. In Fig. 1, we have $n = 10$ and $p = 75$.
Thus, in accordance with Lemma 1, we have

$$\gamma_i = \begin{cases}
4, & \text{if } 8 < i \le 10 \\
2, & \text{if } i = 8 \\
0, & \text{otherwise.}
\end{cases}$$

The above may also be verified visually from Fig. 1. It is easy
to note that the students at 7th, 8th, and 9th index improve
up to the score of the 10th student, while the 10th student
with zero gain remains at the same score. This makes the
score of the 10th student visible four times in the updated
scores, leading to the $\gamma$ value of four. Similarly, the score
of the student at 9th index is also visible four times because
of students at 3rd, 4th, 5th, and 6th indexes improving up to
his score. On the other hand, only students at 1st and 2nd
indexes improve up to the score of 8th student. Hence, a
value of two for the 8th student. No one is improving his
score up to the score of any of the students at index below
eight. So, the $\gamma$ values corresponding to them are zero.
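The coefficients can also be checked numerically. The sketch below, with hypothetical helper names `gamma_by_counting` and `gamma_closed_form`, counts how many students are referred to each index and compares the result with the closed form stated in Lemma 1. It assumes that 100/(100 - p) is an integer and that p < 100.

```python
from collections import Counter
from fractions import Fraction
from math import ceil


def gamma_by_counting(n, p):
    """gamma[i - 1] = number of students whose reference index is i.

    The top student (i = n) is referred to itself, matching the zero-gain
    term added in the derivation of Eq. 5.
    """
    p = Fraction(p)
    hits = Counter(i + ceil((n - i) * p / 100) for i in range(1, n + 1))
    return [hits[i] for i in range(1, n + 1)]


def gamma_closed_form(n, p):
    """Coefficients as given by Lemma 1 (needs 100/(100 - p) to be an integer)."""
    p = Fraction(p)
    m = 100 / (100 - p)
    if m.denominator != 1:
        raise ValueError("100/(100 - p) must be an integer")
    m = int(m)
    cut = ceil(n * p / 100)
    gamma = []
    for i in range(1, n + 1):
        if cut < i <= n:
            gamma.append(m)
        elif n % m != 0 and i == cut:
            gamma.append(n % m)
        else:
            gamma.append(0)
    return gamma


# Setting of Example 1: n = 10, p = 75.
print(gamma_by_counting(10, 75))   # [0, 0, 0, 0, 0, 0, 0, 2, 4, 4]
print(gamma_closed_form(10, 75))   # same list
```

The coefficients always sum to n, since every student is referred to exactly one index.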


Unfortunately, when $(100 - p) \nmid 100$, the coefficients $\gamma_i$ have
complex structure and we defer their study to future work.
Algorithm 1 (Percentile_Partitions) Optimal Partitioning
for maximizing Learning Gain - learning up to p-percentile


 1: Input: Distinct descending scores $\{\theta_1, \theta_2, \ldots, \theta_N\}$, Percentile $p$,
    Number of groups $k$, Size of each partition $n$, $k \times n = N$.
 2: $G_1, G_2, \ldots, G_k \leftarrow \emptyset$
 3: $m \leftarrow 100/(100 - p)$
 4: $q \leftarrow \lfloor n/m \rfloor$
 5: $\bar{q} \leftarrow \lceil n/m \rceil$
 6: if $\mathrm{mod}(n, m) \neq 0$
 7:     $M \leftarrow \{\theta_{kq+1}, \ldots, \theta_{kq+k}\}$
 8:     for $i \in \{1, 2, \ldots, k\}$
 9:         $G_i \leftarrow G_i \cup M_i$
10:     end for
11: end if
12: $H1_{global} \leftarrow \{\theta_1, \theta_2, \ldots, \theta_{kq}\}$
13: $H2_{global} \leftarrow \{\theta_{k\bar{q}+1}, \ldots, \theta_{N-1}, \theta_N\}$
14: for $i \in \{1, 2, \ldots, k\}$
15:     $H1_{part} \leftarrow$ randomly sample $q$ scores from $H1_{global}$ without replacement.
16:     $H2_{part} \leftarrow$ randomly sample $(n - \bar{q})$ scores from $H2_{global}$ without replacement.
17:     $G_i \leftarrow G_i \cup H1_{part} \cup H2_{part}$
18: end for
19: return $\{G_1, G_2, \ldots, G_k\}$


4.1 Percentile_Partitions 

Lemma 1 leads to our optimal partitioning algorithm, which
is shown in Algorithm 1. The algorithm first divides the
input ability scores into two or three sets depending on
whether $\mathrm{mod}(n, 100/(100 - p))$ is zero or not, respectively.
The first set $H1_{global}$ consists of scores that contribute by
a factor of $100/(100 - p)$ to the learning gain. The second
set $M$, if present, consists of scores that contribute by a
factor of $\mathrm{mod}(n, 100/(100 - p))$. Finally, the third set $H2_{global}$
consists of scores that have zero contribution. These sets
correspond to the three different values of the $\gamma$ coefficients.
They are such that $H1_{global} \succeq M \succeq H2_{global}$, where $A \succeq B$
means all elements of set $A$ are greater than or equal to
any element of set $B$. For each of these sets then, the
algorithm creates k equal random partitions. These partitions
are then merged to create the final k partitions. The
example below illustrates the algorithm.


EXAMPLE 2. Consider a set of 20 students with ability
scores $\{\theta_1, \theta_2, \ldots, \theta_{20}\}$, sorted in the descending order. The
set is to be partitioned into four groups, each containing five
students. Each student can learn up to the score of the student
who is at $66\tfrac{2}{3}$-percentile of students above.


For $p = 66\tfrac{2}{3}$ and $n = 5$, we have $m = 3$, $q = 1$, and $\bar{q} = 2$.
The algorithm breaks the scores into three sets:

$H1_{global} = \{\theta_1, \theta_2, \theta_3, \theta_4\}$

$M = \{\theta_5, \theta_6, \theta_7, \theta_8\}$

$H2_{global} = \{\theta_9, \theta_{10}, \theta_{11}, \theta_{12}, \theta_{13}, \theta_{14}, \theta_{15}, \theta_{16}, \theta_{17}, \theta_{18}, \theta_{19}, \theta_{20}\}$


For each set, four equal-sized random partitions are created,
which are then merged to create four groups:

$G_1 = \{\theta_3\} \cup \{\theta_6\} \cup \{\theta_{17}, \theta_{10}, \theta_{15}\}$

$G_2 = \{\theta_1\} \cup \{\theta_7\} \cup \{\theta_{19}, \theta_{16}, \theta_9\}$

$G_3 = \{\theta_2\} \cup \{\theta_5\} \cup \{\theta_{13}, \theta_{18}, \theta_{12}\}$

$G_4 = \{\theta_4\} \cup \{\theta_8\} \cup \{\theta_{14}, \theta_{20}, \theta_{11}\}$


Note: There are many equally good ways of partitioning
$H1_{global}$, $M$, and $H2_{global}$. The above is just one of them.
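The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudo-code, not the authors' implementation: the function name, the `rng` parameter, and the use of shuffle-then-slice in place of repeated sampling without replacement are assumptions made here. The set M is shuffled as well, which does not change the gain because all of its scores share one coefficient.

```python
import random
from fractions import Fraction


def percentile_partitions(scores, p, k, rng=None):
    """Sketch of Algorithm 1: split scores into k groups of equal size."""
    rng = rng or random.Random()
    theta = sorted(scores, reverse=True)    # descending, as Algorithm 1 expects
    total = len(theta)
    if total % k != 0:
        raise ValueError("k must divide the number of students")
    n = total // k
    m = 100 / (100 - Fraction(p))
    if m.denominator != 1:
        raise ValueError("100/(100 - p) must be an integer")
    m = int(m)
    q = n // m                              # floor(n / m)
    q_bar = -(-n // m)                      # ceil(n / m)

    h1 = theta[:k * q]                      # scores with coefficient m
    mid = theta[k * q:k * q_bar]            # coefficient mod(n, m); empty if m | n
    h2 = theta[k * q_bar:]                  # scores with coefficient 0
    for part in (h1, mid, h2):
        rng.shuffle(part)                   # random assignment within each set

    groups = []
    for i in range(k):
        group = h1[i * q:(i + 1) * q]
        group += mid[i:i + 1]               # at most one score from M per group
        group += h2[i * (n - q_bar):(i + 1) * (n - q_bar)]
        groups.append(group)
    return groups


# Setting of Example 2: 20 students, four groups, p = 66 2/3.
for g in percentile_partitions(range(1, 21), Fraction(200, 3), 4):
    print(sorted(g, reverse=True))
```

Each printed group contains one of the four highest scores, one of the next four, and three of the remaining twelve, mirroring the structure of Example 2.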


4.2 Proof of Theorem 1 

Clearly, if the input scores were already in the descending 
order, the time complexity of the Algorithm 1 is O(N). If 
the input scores were unsorted, then the extra sorting step 
would make the complexity O(N log N). 


The optimality of the algorithm follows from the structure
in the values taken by the coefficients $\gamma_i$. Before proceeding
further, we state the following lemma:


LEMMA 2. For given ordered sets of real numbers, $A =
\{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_n\}$, the quantity
$\sum_{a \in A, b \in B} a \cdot b$, s.t. each $a \in A$ and $b \in B$ is used exactly
once, is maximized if the elements are chosen in a manner
such that the product of elements at the same index from $A$
and $B$ is taken.


Now, according to Lemma 1, $\gamma_i$ can take only one of
three values, which are ordered as
$100/(100 - p) > \mathrm{mod}(n, 100/(100 - p)) > 0$. The partitions
created by the algorithm satisfy $H1_{global} \succeq M \succeq H2_{global}$.
Thus, in light of Lemma 2, it is easy to observe that our
objective is maximized, as the set of students with higher (lower)
scores gets mapped to the highest (lowest) coefficient. Moreover,
the random perturbations within $H1_{global}$, $M$, or $H2_{global}$
do not affect the gain value, as all the scores from a set are
involved in a product with the same $\gamma$ value.
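On a small instance, the optimality argument can be checked by exhaustive search. The sketch below is only a sanity check under stated assumptions: the helper names are hypothetical, the scores are made up for illustration, and `lemma_optimum` evaluates the objective value implied by Lemma 1 (coefficient m for the top kq scores, mod(n, m) for the next k, zero for the rest) rather than running Algorithm 1 itself.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def group_gain(scores, p):
    """Learning gain of one group under the p-percentile model (Eq. 3)."""
    theta = sorted(scores)
    n = len(theta)
    p = Fraction(p)
    return sum(theta[i + ceil((n - i) * p / 100) - 1] - theta[i - 1]
               for i in range(1, n + 1))


def equal_partitions(items, n):
    """Yield every partition of distinct items into unordered groups of size n."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for others in combinations(rest, n - 1):
        remaining = [x for x in rest if x not in others]
        for tail in equal_partitions(remaining, n):
            yield [(first,) + others] + tail


def brute_force_best(scores, p, k):
    """Best total gain over all partitions into k equal groups."""
    n = len(scores) // k
    return max(sum(group_gain(g, p) for g in part)
               for part in equal_partitions(scores, n))


def lemma_optimum(scores, p, k):
    """Total gain implied by Lemma 1 when the largest scores get the largest coefficients."""
    theta = sorted(scores, reverse=True)
    n = len(theta) // k
    m = int(100 / (100 - Fraction(p)))
    q = n // m
    return (m * sum(theta[:k * q])
            + (n % m) * sum(theta[k * q:k * q + k])
            - sum(theta))


scores = [3, 8, 1, 9, 4, 7]   # illustrative data, not from the paper
print(brute_force_best(scores, 50, 2), lemma_optimum(scores, 50, 2))
```

Exhaustive search is feasible only for tiny classes; it is used here purely to cross-check the closed-form optimum.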


5. EXPERIMENTS 


5.1 Datasets 

1. SSC Scores (Normal distribution): Staff Selection 
Commission - Combined Graduate Level Examination (SSC- 
CGL) is conducted all across India to recruit employees for 
various departments of Government of India. The scores of 
candidates for the 2016 examination, categorized into differ- 
ent regions of the country, are available at ssc.nic.in. The 
distribution of scores in every region is close to normal. We 
took the scores from the North Western (SSC-NWR) region 
that exhibits the largest variance. 


2. GATE Scores (Log-Normal distribution): In In- 
dia, Graduate Aptitude Test in Engineering (GATE) is con- 
ducted every year to test the competency of undergradu- 
ate students in various engineering disciplines. We took 
the available scores from year 2016. We experimented with 
scores from Mech. (GATE-ME), with largest variance. 


3. StkXchg UpVotes (Pareto distribution): On the 
Stack Exchange platform, users can ask and answer ques- 
tions on various topics. Additionally, they can up-vote or 
down-vote a question. The number of up-votes a user re- 
ceives is an indicative measure of his level of expertise. Pareto 
distribution fitted the data for the active users having at 
least one up-vote. The Stack Exchange data dump is avail- 
able from archive.org/details/stackexchange. We take data 
for Stack Overflow, which exhibits the lowest skew in distribution.


5.2 Algorithms 


In addition to Percentile_Partitions, we consider two algo- 
rithms that correspond to the strategies currently prevalent 
in practice: Stratified and Random. 


1. Stratified: This algorithm puts in each group those stu- 
dents who exhibit similar ability. This grouping represents 
the practice of homogeneous or ability-based grouping. 


2. Random: Students are assigned to groups randomly. 
This method corresponds to the practice of heterogeneous 
or diversity-based grouping. 
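The two baselines can be sketched as follows. The paper does not spell out their implementation, so the choices below are assumptions: Stratified is taken to mean consecutive blocks of the sorted score list, and Random is a uniform shuffle followed by slicing into equal groups. The function names are hypothetical.

```python
import random


def stratified_groups(scores, k):
    """Homogeneous grouping: consecutive blocks of the sorted scores (assumed reading)."""
    theta = sorted(scores)
    n = len(theta) // k
    return [theta[i * n:(i + 1) * n] for i in range(k)]


def random_groups(scores, k, rng=None):
    """Heterogeneous grouping: shuffle, then cut into k equal groups."""
    rng = rng or random.Random()
    shuffled = list(scores)
    rng.shuffle(shuffled)
    n = len(shuffled) // k
    return [shuffled[i * n:(i + 1) * n] for i in range(k)]


print(stratified_groups([5, 1, 4, 2, 3, 6], 2))  # [[1, 2, 3], [4, 5, 6]]
```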


5.3 Set Up 


We conducted our experiments setting the number of stu- 
dents, N, to 1024. We varied the number of groups, k, over 
{2,4,8,..., 512}, and the reference percentile point p over 
$\{50, 66\tfrac{2}{3}, 75, 80, 90, 95, 98, 99\}$. Thus, for each dataset, we
randomly sample 1024 scores and generate the groups for 
different combinations of k and p values. In order to have 
tight confidence intervals, we repeat this exercise 30 times 
each and report average learning gain. 


For the groups generated by Percentile_Partitions, we com- 
pute learning gain using Eq. 3. When applying Stratified 
or Random to a dataset, we generate groups only once but 
compute gain using the appropriate parameter value for p. 


We also study the group structures generated by different 
algorithms. By the structure of a group, we mean the dis- 
tribution of scores in the group. Although we run each al- 
gorithm 30 times, we only show the structure of the group 
generated by the first run. 


5.4 Results 


Fig. 2 shows the learning gain as the reference percentile
value, p, is varied for different algorithms on various datasets.
We show the plots for three values for the number of groups,
$k \in \{128, 32, 8\}$ (and the corresponding group sizes,
$n \in \{8, 32, 128\}$). Fig. 3 shows the learning gain as the number
of groups, k, is varied. We show the plots for two percentile
values, $p \in \{75, 90\}$. Fig. 4 shows the group structures
generated by different algorithms. We show the structures
for groups of size $n = 8$ and for the reference percentile
$p = 75$. We alert the reader that different scales have been
used for Y-axis in Figs. 2-3 and a logarithmic scale has been
employed for X-axis in Fig. 3 for the sake of clarity.


We see that the overall behavior of different algorithms remains
similar across different group sizes and reference percentile
values. Clearly, Percentile_Partitions consistently outperforms
the other algorithms, which corroborates its theoretical
optimality. The following additional observations are
noteworthy:


• With increasing value of p, total learning gain increases
superlinearly (Fig. 2). This is because the extent of learning
gain for each student increases. The gain plateaus for
small groups because beyond some percentile value, all
students improve up to the same highest ability student.
Then, it does not matter whether the reference percentile
is at 90 or 95.


• The advantage of Percentile_Partitions over Random is
more pronounced when the number of students in a group
is in a more realistic range of 32 or less (Fig. 3). When the
number of groups is small and each group is large,
Percentile_Partitions assigns very many students randomly
and therefore the group structure and gain produced by
it become similar to that of Random.


• The learning gain is worst with the stratified strategy.
Fig. 4 shows that this strategy produces groups in which
the students have similar scores. Therefore, the improvements
from peer interactions are small. Fig. 4 also shows
that the p-percentile value of every group produced by
Percentile_Partitions is higher than the global p-percentile
value of all the undivided scores. However, this pattern
is not true for Random. Some groups generated by Random
have p-percentile to the extreme right of global
p-percentile. The scores in between the two p-percentiles in
such groups do not contribute to the total gain. But then
some other groups end up having smaller scores above
p-percentile, which leads to smaller additions to the total gain.
Hence, the superior performance of Percentile_Partitions.

Figure 4: [...] group and there is a dot for each ability score in that group. The p-percentile score for each group is plotted in black. The vertical red
line shows the global p-percentile score. The groups are numbered according to the order in which they are generated. Only for
Percentile_Partitions, the p-percentile score for every group is higher than the global p-percentile value.


6. SUMMARY 


We investigated the important educational data mining prob- 
lem of how to group students in a class to maximize their 
learning gains from peer interactions. We worked with a 
general learning gain function in which every student is able 
to increase his ability score up to the score of the student 
who is at p-percentile amongst his higher ability peers. We 
gave an algorithm which is provably optimal for maximizing
learning gain, provided the value of p is such that $100/(100 - p)$ is
integer valued. We also studied the performance characteristics
of the proposed algorithm using real-life datasets that
corroborated the theoretical analysis and showed its supe- 
riority over the current approaches. Surprisingly, the time 
complexity of optimally grouping N students using our al- 
gorithm is only O(N log N). 